RESEARCH & PRACTICE IN ASSESSMENT 



AUTHORS 

Julie S. Gray, Ph.D. 

The University of 
Texas at Arlington 

Melissa A. Brown, M.Ed. 
The University of 
Texas at Arlington 

John P. Connolly, Ph.D. 
The University of 
Texas at Arlington 


CORRESPONDENCE 

Email 

julie.gray@uta.edu 


RPA Volume Twelve | Summer 2017


Abstract 

Data-driven decision making is increasingly viewed as essential in 
a globally competitive society. Initiatives to augment standardized 
testing with performance-based assessment have increased as educators 
progressively respond to mandates for authentic measurement of 
student attainment. To meet this challenge, multidisciplinary rubrics 
were developed as a method of scoring student work samples. The 
current study utilized confirmatory factor analysis to examine ratings 
of student work (N = 245) using the Quantitative Literacy VALUE 
Rubric from the Association of American Colleges and Universities. 

The study examined a conceptual model of the six skill measures 
from the rubric to validate whether, taken together, they are reliable 
measures of a single general construct—Empirical and Quantitative 
Skill (EQS), a Texas Core Curriculum objective. The model confirmed 
that the six measures in the rubric (Interpretation, Representation,
Calculation, Application/Analysis, Assumptions, and Communication) 
appeared to describe a single construct. Results support using the 
Quantitative Literacy VALUE Rubric for assessing EQS. 


Examining Construct Validity of the 
Quantitative Literacy VALUE Rubric 
in College-level STEM Assignments 

An individual’s quantitative literacy and competence with data evaluation
is helpful in all areas of life, including academia. Because data-driven decision making is 
increasingly viewed as essential in a globally competitive society, educational objectives 
often emphasize learning outcome elements such as data analysis and how to use the data 
to draw conclusions. Data analysis without an understanding of the implications limits 
appropriate actions that can be taken by individuals and businesses (Tufte, 1997). Further, 
hiring managers seek individuals with empirical and quantitative skills because they have 
the ability to see connections and systemic problems (National Association of Colleges and 
Employers, 2016). Indeed, findings from the Spellings Commission panel stated that, “better 
data about real performance and lifelong working and learning ability is absolutely essential 
if we are to meet national needs and improve institutional performance” (U.S. Department 
of Education, 2006, p. 30). 

Over the past two decades, educational policies in the United States were changed
by congressional legislation (e.g., No Child Left Behind Act [NCLB], 2001; Every Student
Succeeds Act [ESSA], 2015). McGuinn (2006) maintains that NCLB was implemented
in response to public sentiment to hold educators accountable for the instruction students 
receive. More recently, initiatives to augment standardized testing with performance- 
based assessment (PBA) have increased as educators progressively respond to mandates 
for authentic measurement of student attainment. This progression is particularly reflected 
in the recently legislated ESSA (Gewertz, 2015), which is anticipated to go into full effect 
during the 2017-2018 academic year. The next section briefly reviews some of the policy 
implications for assessment professionals. 

Impact of Policy Changes on Assessment Professionals 

NCLB in particular affected the responsibilities of educational assessment 
professionals in requiring that each state must measure student progress for an academic
year using single summative tests (Gewertz, 2015). As a result, a reliance upon standardized
tests quickly developed to assess student attainment and inform process improvements in 
educational service delivery (Supovitz, 2009). Such testing often took the form of high- 
stakes, multiple-choice examinations. However, in the last decade, initiatives to extend 
assessment methods to include performance-based student work have gained momentum at 
many institutions (State Higher Education Executive Officers Association [SHEEO], 2016). 
As ESSA implementation moves toward completion, assessment professionals and state 
officials anticipate that it will provide them with options that include multiple measures 
during an academic year, including merging results from both standardized tests and 
performance-based tests (Gewertz, 2015). While many call the assessment of performance- 
based work a more authentic method of rating student attainment (Montgomery, 2002; 
Peden, Reed, & Wolfe, 2017; Rhodes, 2010; Rhodes & Finley, 2014), efforts to validate the 
way it is rated or scored present challenges for educators (Montgomery, 2002). 

PBA Challenges 

PBA implies that in response to the assignment prompt, a student reveals the skills 
they have attained to date. That is, the student response contains authentic agreement 
between what the student knows and their ability to demonstrate that knowledge (Cobb, 
2014). Unlike standardized tests, PBAs typically consist of written student work samples (e.g., 
essays, experimental or research lab summaries, and presentations). However, while PBA 
holds an advantage of authenticity, it also presents a disadvantage. Montgomery (2002) lists
concerns reported in the literature, including the difficulty of avoiding rater subjectivity when 
scoring authentic student work samples. 

In contrast, normed scores for standardized tests for specific student populations 
typically guide comparisons based on equity and excellence. PBA often requires the introduction 
of a rubric to increase rater objectivity. Use of rubrics adds structure and consistency to the 
performance level assessment and comparisons (Montgomery, 2002). 

VALUE Rubric Development as a Solution-Oriented Assessment Approach 

Indeed, Montgomery (2002) recommended the use of rubrics for assessing authentic 
student work because they are tools that communicate to students the expected elements to 
include in the completed assignment. Rubrics for setting criteria and determining student 
attainment of the target objectives have been suggested to uphold equity and excellence for all 
students (Montgomery, 2002; Peden et al., 2017). That said, an evidence-based approach for 
evaluating PBAs using validated rubrics was needed. 

A campus-based assessment initiative, led by the Association of American Colleges 
and Universities (AAC&U), published 16 Valid Assessment of Learning in Undergraduate 
Education (VALUE) Rubrics (AAC&U, 2017b). Faculty and other educational professionals 
gathered from over 100 different institutions of higher education, under the direction of the 
AAC&U, to develop the rubrics. The VALUE rubrics were designed to be scoring guides that 
can be used by universities to evaluate authentic student work samples. Further, the AAC&U 
outlined four families of Essential Learning Outcomes in order to advance VALUE rubrics as 
relevant assessment tools across a wide range of disciplines, courses, and objectives (National 
Leadership Council for Liberal Education & America’s Promise, 2008). 

These VALUE rubrics serve as a scaffold to government policies that endeavor to 
guarantee the quality of education across the United States for all students (AAC&U, 2017b). 
Though policies vary by state, they broadly included six educational objectives: critical thinking, 
communication, empirical and quantitative skill (EQS), teamwork, social responsibility, 
and personal responsibility. In the southwestern United States, the Texas Higher Education 
Coordinating Board (THECB) adopted the six aforementioned objectives for implementation 
in the most recent revision of the Texas Core Curriculum (TCC; THECB, 2011). The THECB 
required that all two-year and four-year educational institutions submit regular reports 
detailing the assessment practices and results for student TCC objective attainment within 
general education courses that have been approved and designated for inclusion in the TCC 
(THECB, 2011). Decision making regarding methodologies for rating these performance-based 
student work samples was left to the discretion of each institution by the THECB. 


Preliminary studies supported by SHEEO and AAC&U consisted of a collaborative
effort by 60 institutions in nine states that agreed to test the utility of the VALUE Rubric to
rate authentic student work (SHEEO, 2016). In 2014-2015, they examined faculty ratings 
of authentic student work to determine levels that indicate healthy thresholds for student 
mastery (Lederman, 2015). While the multi-state collaborative vetted two rubrics in the 
practice of evaluating student work during its initial phase and current studies expanded to 
include more institutions, to date, they have not explored rubric construct validity. Studies 
are needed to investigate the extent to which the measures within each VALUE Rubric 
accurately represent a single construct. 

Importance of Studying the Quantitative Literacy VALUE Rubric 

Case studies document the use of the VALUE Rubrics nationwide (AAC&U, 2017a;
Peden et al., 2017). This study examines the construct validity of the AAC&U Quantitative
Literacy VALUE Rubric for evaluating EQS, a TCC objective. EQS allows an individual to
understand information or raw data that is presented in tables, charts, graphs, or figures and 
evaluate it to draw accurate conclusions. Identifying applications of EQS across academic 
disciplines is straightforward. The ability to take information, analyze it, and predict outcomes 
is a common theme in the hard sciences such as engineering, physics, chemistry, and biology. In 
addition, EQS is utilized across disciplines, for instance, in nursing, business, and psychology. 

Individuals possessing skills such as EQS are in high demand because they can use 
this expertise to find evidence-based solutions. EQS is typically described using action verbs 
including identify, extract, validate, and report (Georgesen, 2015). Further, the process often 
follows an ordered set of action steps. For example, Georgesen (2015) extended the list as a 
set of four steps: 1) define, scope, identify, document; 2) extract, aggregate, transform, create; 
3) develop, analyze, simulate, validate; and 4) report, recommend, implement, monitor. 
The extent to which these verbs can be translated into observable measures is essential to 
evaluating student attainment of the TCC objective EQS.

The current study focused on the measures within the Quantitative Literacy VALUE 
Rubric and its utility for measuring EQS. The six skill indicators measured by this rubric 
are Interpretation, Representation, Calculation, Application/Analysis, Assumptions, and
Communication. Explanations for each are contained in the rubric (see Appendix). Our 
hypothesis is that there is a single underlying trait or “latent variable” of which the six different 
skills are indicators. In short, we wish to validate that the six different skills being assessed, 
taken together, are reliable measures of something more general. 

Method 

The skills within the Quantitative Literacy VALUE Rubric were assessed using written 
samples of undergraduate student work from approved Signature Assignments embedded 
in the existing undergraduate TCC courses at a four-year public institution in an urban
setting. The institution met requirements to serve as a Hispanic Serving Institution by the
U.S. Department of Education (2016) and, importantly, received the R-1 designation by the
Carnegie Classification of Institutions of Higher Education (2015), the definitive list for top
doctoral research. The measurement of student attainment of EQS is of extreme interest 
because of the institutional focus on research. 

Signature Assignments were designed to be authentic performance-based work in 
which students responded to pedagogically relevant prompts. For example, some Signature 
Assignments consisted of written summaries of actual lab experiments conducted by students 
in life and physical sciences courses. These papers, illustrated by tables and figures, essentially 
included measurable elements of Interpretation, Representation, Calculation, Application/ 
Analysis, Assumptions, and Communication. All the Signature Assignments in this sample 
were collected from courses related to science, technology, engineering and math (STEM). 
Trained faculty and staff who participated in calibration and training exercises (described in 
more detail to follow) performed the ratings. 

Participants

Signature Assignments were obtained from 296 undergraduates enrolled in core 
curriculum courses in STEM areas at the university. The readability of a portion of the 
assignments (n = 51) was poor because they were scanned copies of handwritten summaries 
from lab books or “blue books.” As such, these 51 Signature Assignments were dropped from 
the sample and not rated. Ratings were available for 245 of the student Signature Assignments. 
Over half of the participants were female (61%; n = 149), which closely matched the gender 
ratio at the university. The sample also reflected a rich diversity of students. About a third 
of the student participants identified as White (33%; n = 80), almost a third identified as 
Hispanic (27%; n = 67), and the balance was split between African American; Asian; foreign, 
nonresident alien; multiple ethnicity; and unknown, not specified. Students represented nine 
of ten colleges and schools at the university (see Table 1). 

Table 1. Student Characteristics for the Rated Sample of Signature Assignments

Categorical Variables                          N       %

Gender
  Female                                     149    60.8
  Male                                        96    39.2

Ethnicity
  African American                            33    13.5
  Asian                                       49    20.0
  Caucasian                                   80    32.7
  Foreign, nonresident alien                   6     2.4
  Hispanic                                    67    27.3
  Multiple                                     5     2.0
  Unknown, not specified                       5     2.0

College/School
  College of Architecture                      2     0.8
  College of Business                         24     9.8
  College of Education                        13     5.3
  College of Engineering                      15     6.1
  College of Liberal Arts                     26    10.6
  College of Nursing                          61    24.9
  College of Science                          61    24.9
  School of Social Work                       15     6.1
  Undeclared                                  26    10.6
  ᵃMissing college or school information       2     0.8

Level
  Freshman                                    67    27.3
  Sophomore                                   85    34.7
  Junior                                      49    20.0
  Senior and above                            42    17.1
  ᵃMissing level information                   2     0.8

Note: N = 245 for each of the categorical variables. ᵃInformation was missing


Procedure 

Faculty currently teaching undergraduate courses in STEM areas agreed to submit the 
course set of authentic student work deemed as the Signature Assignment for this study. The 
syllabus for each core curriculum class at the university describes the Signature Assignment 
and the students enrolled in these courses complete it as they would any other assignment 
or required course work. The samples submitted for this assessment process were ungraded, 
de-identified copies. Steps to redact personal and academic information were followed for two 
reasons. The first was to prevent any bias among rater scores in response to the grade the 
paper received from the professor. The second was to protect the confidentiality of student, 
faculty, and course information. 

Assessment Instrument 

The Signature Assignments were assessed using the VALUE Rubric for Quantitative 
Literacy (AAC&U, 2009), which categorizes EQS into six measures: Interpretation, 
Representation, Calculation, Application/Analysis, Assumptions, and Communication. The 
rubric describes each measure and uses a four-point Likert scale for determining scores (see 
Appendix). Higher values indicate more evidence of EQS. Using the rubric, raters assigned a 
score to each of the six skill measures. 

Typically, in student samples, the six measures are adequately represented in the 
narrative of the Signature Assignment. It is important to note that visual communication 
in the form of charts, graphs, and figures enhanced the identification of the Representation 
and Communication measures. This is not unexpected because communication (written 
and visual) is required for fleshing out and articulating ideas in STEM areas. Visual 
communication is particularly important, and in many cases essential, for depicting 
information in STEM areas. 

Raters, Rater Calibration, and Scoring 

For the purposes of this study, the unit of analysis was an individual rater’s score 
for a particular Signature Assignment. Raters scored the student writing samples during a 
scheduled scoring day so each paper was read and then rated by at least two separate raters 
working independently in a group setting. The rater group included ten faculty members 
and professional staff with advanced degrees. Scoring day began with an orientation and 
description of the rating process. Then, the entire group read one anchor paper chosen 
by the facilitator. Next, the facilitator led a discussion focused on reaching a common 
understanding of the EQS measures and finding exemplar indicators within the anchor paper 
for the rubric’s levels of mastery. Then the rating process began and raters individually read 
their assigned papers to score each measure with the rubric (four-point Likert scale). Two 
raters independently rated each paper. Measure scores were calculated as the average of 
both scores. The facilitator checked each paper, after the completion of the two ratings, to 
review whether disagreement between measure ratings exceeded acceptable metrics. If so, 
the facilitator assigned a third rater as a separate, impartial mediator. In those cases (n = 4) 
the outlier of the three ratings was replaced. 
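
The scoring rule described above can be sketched in Python. This is an illustration under stated assumptions, not the procedure the facilitator actually used: the function names and the `MAX_GAP` cutoff of one scale point are hypothetical, because the article does not report what level of disagreement "exceeded acceptable metrics." The sketch also assumes that replacing the outlier means averaging the two closest of the three ratings.

```python
from statistics import mean

# Hypothetical cutoff: the article does not state the actual disagreement threshold.
MAX_GAP = 1


def needs_third_rater(r1, r2, max_gap=MAX_GAP):
    """Return True when two ratings differ by more than the allowed gap."""
    return abs(r1 - r2) > max_gap


def measure_score(r1, r2, r3=None):
    """Average two ratings; if a third rating exists, drop the outlier first."""
    if r3 is None:
        return mean([r1, r2])
    low, mid, high = sorted([r1, r2, r3])
    # The outlier is the end value that sits farther from the middle rating.
    # On a tie, this sketch arbitrarily keeps the two lower ratings.
    kept = [mid, high] if (mid - low) > (high - mid) else [low, mid]
    return mean(kept)
```

For example, ratings of 3 and 4 on a measure would not trigger mediation under the assumed cutoff and would yield a measure score of 3.5, whereas ratings of 1 and 4 would be sent to a third rater.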

Inter-rater Agreement 

To examine the agreement between raters, an estimate of inter-rater reliability was 
calculated to see how frequently the rater pairs agreed on the score when rating the same paper. 
Conclusions about the consistent measurement of the six measures depend on this estimate. 
A calculation of the intraclass correlation coefficient (ICC) was used to determine the level of 
inter-rater agreement. High ICC values indicate more agreement between raters. A one-way 
random model was used to measure consistency within the mean measure values. ICC values 
for Interpretation, Representation, Calculation, Application/Analysis, Assumptions, and
Communication indicated good inter-rater agreement (see Table 2) even though rater pairs 
varied across ratings, which typically results in lower ICC values (Landers, 2015). 


Table 2. ICC Values by Measure

Measure                   ICC Value
Interpretation                  .52
Representation                  .51
Calculation                     .47
Application/Analysis            .56
Assumptions                     .51
Communication                   .60

Note: N = 245 for each measure.
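
As a rough illustration of how a one-way random ICC is obtained, the sketch below computes the single-rating and average-rating forms from a one-way analysis of variance. The function name `one_way_icc` and the toy ratings are assumptions made for this example; the values in Table 2 came from the authors' own software, and the article does not provide the raw ratings needed to reproduce them.

```python
def one_way_icc(ratings):
    """One-way random-effects ICC for a list of papers, each rated k times.

    Returns (single_rating_icc, average_rating_icc).
    """
    n = len(ratings)       # number of papers
    k = len(ratings[0])    # ratings per paper (assumed equal for every paper)
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-paper and within-paper mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    average = (ms_between - ms_within) / ms_between
    return single, average


# Hypothetical ratings: five papers, two raters each, on the four-point scale.
toy_ratings = [[4, 4], [3, 4], [2, 2], [1, 2], [3, 3]]
single_icc, average_icc = one_way_icc(toy_ratings)
```

The one-way form is a natural choice when, as in this study, the pair of raters differs from paper to paper, so rater effects cannot be separated from residual error.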

Figure 1. Conceptual Model of Underlying EQS Traits

Analysis Plan 

We used confirmatory factor analysis to assess whether the six measured skills
are reliable indicators of an underlying more general construct (Brown, 2006). One key
advantage of this approach is the ability to isolate the underlying construct from random
error variance in the indicator measures. Further, correlations across the error components
of each survey item can also be modeled to account for method effects that detract from
the underlying construct, such as any tendency to rate two of the skills more similarly than
the others. Figure 1 depicts the conceptual model (H0).

Because the measure ratings are in the form of a Likert scale, and therefore categorical, 
we used a mean- and variance-adjusted weighted least squares (WLSMV) estimator to estimate 
the loadings of each measure on the underlying EQS trait (Muthén & Muthén, 1998-2012).

The same estimator also yields fit statistics that provide information on the overall 
reliability of the model in terms of its ability to reproduce the variances and covariances of 
the indicator measures. Ideally, the model reports a nonsignificant chi square value indicating 
that imposing the hypothesized structure on the data does not amount to a substantial loss 
of information. However, since chi-square statistics are proportional to sample size, other
statistics are commonly used to assess model fit. In particular, a Root Mean Square Error of
Approximation (RMSEA) statistic that is below 0.05 and a Comparative Fit Index (CFI) greater
than 0.95 indicate a model that is a good fit to the data (Byrne, 2012).
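
The two cutoffs can be expressed as a simple check. The sketch below is illustrative only: the function name `is_good_fit` is hypothetical, and the example values are the rounded RMSEA and CFI figures reported in Table 3.

```python
RMSEA_CUTOFF = 0.05  # good fit requires RMSEA below this value (Byrne, 2012)
CFI_CUTOFF = 0.95    # good fit requires CFI above this value (Byrne, 2012)


def is_good_fit(rmsea, cfi):
    """Apply both fit criteria; a model must satisfy each one."""
    return rmsea < RMSEA_CUTOFF and cfi > CFI_CUTOFF


# Rounded values as reported in Table 3.
h1_fits = is_good_fit(rmsea=0.03, cfi=0.99)  # model with the error correlation
h0_fits = is_good_fit(rmsea=0.13, cfi=0.98)  # model with no error correlations
```

Requiring both criteria matters here: the H0 model clears the CFI cutoff but not the RMSEA cutoff, so it would not count as a good fit under this rule.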

Results 

All the analyses were conducted in Mplus v.7.31 (Muthén & Muthén, 2012), which
also reports ways of improving the model via modification indices. Analysis of the set of
ratings from rater 1 and then the set from rater 2 (from the rater pairings) indicated that
significant model improvement would be obtained by allowing the random error variances in 
the Representation and Calculation measures to correlate. The fit statistics of the two models, 
i.e., the model with the specified error correlation (the H1 model) and the model with no error
correlations (the H0 model), are summarized in Table 3. The H1 model met all the criteria of a
well-fitting model in terms of the key fit statistics: chi square, RMSEA, and CFI. The table also
showed a significant loss of fit for the H0 model in terms of a chi-square difference test.

The unstandardized loadings of each of the six skill measures on the underlying EQS 
latent variable are summarized in Table 4. The standardized estimates, along with associated 
standard errors, are shown in Figure 2. Also included in Figure 2 is the estimate for the error 
correlation between Representation and Calculation. 


Table 3. Model fit statistics for the H1 and H0 models with χ² difference test

                     N      χ²    df   P-Value   RMSEA    CFI
H1 Model           245    9.31     8     0.317    0.03   0.99
H0 Model           245   47.31     9     0.000    0.13   0.98
Difference Test          19.40     1     0.000
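
The RMSEA values in Table 3 can be checked against the reported chi-square statistics with the standard point-estimate formula. The sketch below is an approximation under stated assumptions: the function name `rmsea` is hypothetical, the formula uses N - 1 in the denominator (some programs use N), and the software's own computation for the WLSMV estimator may differ in detail. Note also that the WLSMV difference test is not the simple difference of the two chi-square values, which is why Table 3 reports 19.40 rather than 38.00.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a chi-square statistic, its df, and sample size."""
    # The numerator is floored at zero so that models with chi-square < df give 0.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


rmsea_h1 = rmsea(chi_square=9.31, df=8, n=245)   # model with the error correlation
rmsea_h0 = rmsea(chi_square=47.31, df=9, n=245)  # model with no error correlations
```

Rounded to two decimals, both results agree with the RMSEA column of Table 3.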




The estimates in Table 4 are akin to regression estimates of the effect of the 
underlying EQS trait on the skill in question—all of which were statistically significant at the 
0.01 alpha level. The three strongest indicators were Communication, Application/Analysis, 
and Interpretation, and the amount of variance in these indicators explained by EQS is 77%,
73%, and 71%, respectively. Weaker effects were found in the case of Calculation (53%), 
Assumptions (47%), and Representation (42%). 
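
One way to see how these percentages relate to the unstandardized loadings in Table 4 is sketched below. It rests on an assumption the article does not state: that each indicator's underlying continuous response has unit variance, so that the variance explained equals the squared loading multiplied by the variance of the EQS factor. Under that assumption, the factor variance can be inferred from the marker indicator (Interpretation, loading fixed at 1, 71% explained). The variable names are illustrative.

```python
# Unstandardized loadings from Table 4.
loadings = {
    "Interpretation": 1.00,
    "Representation": 0.77,
    "Calculation": 0.87,
    "Application/Analysis": 1.02,
    "Assumptions": 0.81,
    "Communication": 1.04,
}

# Proportions of variance explained, as reported in the text.
reported = {
    "Interpretation": 0.71,
    "Representation": 0.42,
    "Calculation": 0.53,
    "Application/Analysis": 0.73,
    "Assumptions": 0.47,
    "Communication": 0.77,
}

# Assumption: unit-variance latent responses, so R^2 = loading^2 * factor variance.
# The marker indicator then implies the factor variance directly.
factor_variance = reported["Interpretation"] / loadings["Interpretation"] ** 2

implied = {name: lam ** 2 * factor_variance for name, lam in loadings.items()}
largest_gap = max(abs(implied[name] - reported[name]) for name in loadings)
```

Under this assumption the implied values track the reported percentages to within about one percentage point, with the small gaps attributable to rounding in the published loadings.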

Table 4. Weighted Least Squares estimates for the six skill measures

                         Estimate    S.E.   Est./S.E.   Two-Tailed P-Value
Interpretationᵃ              1.00    0.00
Representation               0.77    0.07       11.20                0.000
Calculation                  0.87    0.07       12.72                0.000
Application/Analysis         1.02    0.06       17.62                0.000
Assumptions                  0.81    0.06       12.91                0.000
Communication                1.04    0.06       18.98                0.000

Note: ᵃFor the purpose of scaling the latent variable, Interpretation is treated as the marker
indicator. As such, the associated loading of this indicator on EQS is set at a value of one
(Brown, 2006, p.71).
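
The last two columns of Table 4 follow from the first two: the Est./S.E. ratio is treated as a z statistic, and the two-tailed p-value is the probability of a standard normal value at least that extreme. The sketch below uses hypothetical function names and the rounded table entries, so its ratios differ slightly from the published ones (for example, 0.77 / 0.07 gives 11.0 rather than 11.20).

```python
import math


def wald_z(estimate, standard_error):
    """Ratio of an estimate to its standard error, treated as a z statistic."""
    return estimate / standard_error


def two_tailed_p(z):
    """Two-tailed p-value for a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Rounded estimate and standard error for Representation from Table 4.
z_representation = wald_z(0.77, 0.07)
p_representation = two_tailed_p(z_representation)
```

With ratios above 11 for every freely estimated loading, the p-values are far below the 0.01 alpha level, which is why the table prints them as 0.000.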


Discussion 

Current efforts toward the use of PBA to augment standardized testing with students 
present a challenge for educators because of the possible rater bias and other differences in 
scoring authentic student work; thus, there is a need to validate the rubrics that raters use. The 
goal of the current study was to examine the construct validity of the Quantitative Literacy 
VALUE Rubric, one of 16 rubrics developed by the AAC&U. Overall, the findings show that 
the six underlying skill measures tapped into a common underlying EQS trait. These results 
extend previous research that has primarily focused on the use of the rubrics to study trends 
in student attainment (SHEEO, 2016). 

Summary of Findings 

Our hypothesis-testing results suggested that the six measures each reflect EQS as
an underlying trait and that raters using the rubric produced valid EQS scores. Significant
consistency was confirmed by analyzing rubric ratings of authentic student work from
STEM courses at a four-year public university. Each of the six measured skills loaded on the
same construct and the model accounted for a large proportion of variance in each of the
indicators. This validates that the six different skills assessed by the Quantitative Literacy
VALUE Rubric are reliable measures of the general trait, EQS. Though the importance of
nonsubjective measures of PBA has been well established, to our knowledge this is the first
study that confirmed how well the measured skills in the Quantitative Literacy VALUE
Rubric fit together as a model of EQS.

Figure 2. Standardized estimates for final solution (H1)

In addition, the pattern of results indicated three measures with very strong 
contributions to the model, Interpretation, Application/Analysis, and Communication. These
three skill measures are widely used in statistical texts to describe the analytical process 
researchers use after research questions are posed, studies are designed, and data are collected. 
Without them, the research process is just a collection of numbers, and does not contribute 
answers to research questions that often have real consequence in many fields. Indeed, 
national surveys of employers repeatedly list skills involving Interpretation, Application,
and Communication as essential qualities in job applicants (National Association of Colleges 
and Employers, 2016). The model confirmed the strength of the rubric in representing these 
highly marketable skills—those that are involved in quantitative literacy. 


In further support for the model, analyses revealed inter-rater reliability estimates in 
the moderate to good range for the six measures. This suggests that rater calibration activities 
conducted on scoring day may have held a degree of utility in terms of promoting agreement 
among raters. The literature about VALUE rubrics contains many case studies of the use of 
calibration as a best practice (AAC&U, 2017a; Finley, 2011; Peden et al., 2017) yet, to our 
knowledge, it does not contain findings related to calibration activity effectiveness that directly 
compared a trained group of raters with a group that did not undergo any sort of training.

In addition, while inter-rater agreement may have differed with the introduction 
of more than two raters for all Signature Assignments, the study design accounted for the 
importance of good inter-rater agreement by planning the facilitator-led calibration activities 
and using a third rater to mediate unacceptable differences. Indeed, Stanny, Gonzalez, 
and McGowan (2015) mention improvement in rater agreement through the use of similar 
activities that operationalize rubric guidelines with “notes [added to the rubric] about difficult 
decisions, to build and maintain consensus for future decisions” (p. 905). Further, Finley 
(2011) recommends that rating sessions include the type of facilitator-led discussions that 
were used in this study before the application of the rubric to ensure adequate agreement. 
Though not a primary focus of the current study, findings suggested that the level of agreement 
for the ratings in the sample provided adequate justification for proceeding with the analysis 
of the rubric’s construct validity. 

In addition to strengths already mentioned, the model improved when the error terms
of Calculation and Representation were allowed to covary. This makes sense because
a single-minded focus on Calculation makes drawing conclusions hard to visualize and
a skill such as Representation strengthens its meaning. In that way, Calculation and
Representation dovetail. In practice, calculation turns to representation to derive
meaning and understanding as two parts of the same whole. In the process of problem solving, 
making a visual representation is a natural process for deriving meaning from computational 
problems (Van Garderen & Montague, 2003) and for enhancing the decision-making value of 
quantitative information (Tufte, 1997). 

Limitations 

The findings of the current study are promising but a few limitations should be 
noted. For instance, student samples only represented STEM courses in the life and physical 
sciences. This limited the ability to examine the independent effects of other types of courses 
and potential confounds. In future studies, course types should be extended to include all 
three of the foundational component areas required by the THECB (life and physical science,
mathematics, and social and behavioral science). Though all students at the university were
also required to take courses across eight foundational component areas as part of the TCC,
conclusions would be strengthened through the incorporation of a wider range of courses.
Additionally, performance-based work was gathered only from TCC-approved courses and the
naturalistic design of the study did not allow for randomized assignment of papers from across 
all the STEM courses on campus regardless of level. Nonetheless, the student demographics 
suggest that the sample was consistent with the campus population as a whole. 

Conclusion 

Continued efforts are needed to promote the use of authentic student work in 
educational assessment. This study examined a widely utilized rubric using a relatively large 
sample of STEM assignments to capitalize on the strength of the AAC&U initiatives that 
measure student attainment of broadly accepted educational learning objectives. Results 
suggest that the six skill measures contained in the Quantitative Literacy VALUE Rubric 
fit together well to explain EQS. Consequently, efforts to promote VALUE rubrics have the 
potential to accurately measure student attainment of EQS. Further research is needed to 
confirm the construct validity of the full array of AAC&U VALUE Rubrics. Continuation of this
line of inquiry is essential for maximizing the effectiveness of PBA. 


Keywords: quantitative literacy, empirical and quantitative skill, VALUE rubric, STEM, EQS, 
performance-based assessment, Texas Core Curriculum, AAC&U